Handling Data Skew in Map Reduce Using Hadoop Libra

Author

  • Lakshmi Priya
Abstract

Many efficient tools rely on MapReduce, which assigns data to their corresponding tasks for parallel and distributed data processing. LIBRA is a lightweight approach to the data skew problem in MapReduce applications, and it allows the map and reduce stages to overlap. It uses an innovative sampling method that accurately approximates the distribution of the intermediate data during the normal data-processing steps. LIBRA adds only trivial overhead and balances the load across the computing resources. In this paper we propose a method for handling data skew in MapReduce using LIBRA on Hadoop, and we show its effectiveness for web crawling of large datasets from web servers. MapReduce owes its existence to its ability to process huge datasets efficiently: a large job is divided into many small tasks, which are assigned to different nodes for parallel processing. The paper also covers applications and frameworks of MapReduce. Straggler processes, tasks that take much longer than the others, delay the completion of the whole job. Data skew refers to an imbalance in the amount of data assigned to each task, or in the amount of work required to process that data. Real-world datasets are often skewed.
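The abstract describes data skew as an imbalance in how much data each task receives, and LIBRA's idea of estimating the intermediate-key distribution from a sample taken during normal map processing. The Python sketch below illustrates both points under simplifying assumptions; it is not LIBRA's actual algorithm and does not use any Hadoop API. The workload (`skewed_keys`, a hypothetical Zipf-like distribution where key k appears 10000 // k times), the 10% sampling rate, and all function names are illustrative assumptions. `hash_loads` mimics the idea behind hash partitioning (key hash modulo the number of reducers), while `sample_splits` estimates key frequencies from a random sample of the map output and chooses split keys for range partitioning so that each reducer receives roughly an equal share of the estimated records.

```python
import bisect
import random
from collections import Counter


def skewed_keys():
    """Hypothetical Zipf-like map output: key k appears 10000 // k times, k = 1..100."""
    return [k for k in range(1, 101) for _ in range(10000 // k)]


def hash_loads(keys, num_reducers):
    """Records per reducer when each key goes to hash(key) % num_reducers."""
    loads = [0] * num_reducers
    for k in keys:
        loads[hash(k) % num_reducers] += 1
    return loads


def sample_splits(keys, num_reducers, rate, seed=0):
    """Pick num_reducers - 1 split keys from a random sample of the map output.

    Each key range should hold roughly 1/num_reducers of the sampled records.
    """
    rng = random.Random(seed)
    sample = Counter(k for k in keys if rng.random() < rate)
    total = sum(sample.values())
    splits, cumulative = [], 0
    for key in sorted(sample):
        cumulative += sample[key]
        # Close a range at the first key whose cumulative sampled weight reaches
        # the next 1/num_reducers boundary (a heavy key may close several).
        while (len(splits) < num_reducers - 1
               and cumulative >= total * (len(splits) + 1) / num_reducers):
            splits.append(key)
    return splits


def range_loads(keys, splits):
    """Records per reducer when key k goes to range bisect_left(splits, k)."""
    loads = [0] * (len(splits) + 1)
    for k in keys:
        loads[bisect.bisect_left(splits, k)] += 1
    return loads


if __name__ == "__main__":
    keys = skewed_keys()
    reducers = 4
    print("hash partitioning: ", hash_loads(keys, reducers))
    splits = sample_splits(keys, reducers, rate=0.1)
    print("sampled split keys:", splits)
    print("range partitioning:", range_loads(keys, splits))
```

Running the script prints per-reducer record counts for both schemes. With this workload, hash partitioning sends key 1 together with keys 5, 9, 13, ... to the same reducer, which becomes the straggler, whereas the sampled split keys spread the records more evenly and also keep keys in sorted order across reducers. Because no single key is split across reducers here, one very frequent key still bounds the achievable balance; the sketch also ignores overlapping map and reduce stages and differences in node capacity.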

Publication date: 2016